On overfitting and asymptotic bias in batch reinforcement learning with partial observability
Authors
Vincent François-Lavet, Guillaume Rabusseau, Joelle Pineau, Damien Ernst, Raphael Fonteneau
Abstract
This paper stands in the context of reinforcement learning with partial observability and limited data. In this setting, we focus on the tradeoff between asymptotic bias (suboptimality with unlimited data) and overfitting (additional suboptimality due to limited data), and theoretically show that while potentially increasing the asymptotic bias, a smaller state representation decreases the risk of overfitting. Our analysis relies on expressing the quality of a state representation by bounding L1 error terms of the associated belief states. Theoretical results are empirically illustrated when the state representation is a truncated history of observations. Finally, we also discuss and empirically illustrate how using function approximators and adapting the discount factor may enhance the tradeoff between asymptotic bias and overfitting.
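The tradeoff described in the abstract can be made concrete with a small experiment. Below is a minimal sketch (not the paper's code): batch Q-learning on a toy two-state POMDP where the learner's "state" is a truncated history of the last h observations. With a fixed 200-transition batch, h = 0 aggregates all beliefs into a single state (asymptotic bias), while larger h spreads the same data over many rarely visited histories (overfitting risk). The POMDP dynamics, batch size, learning rate, and discount factor are illustrative assumptions.

```python
# Hedged sketch: truncated-history state representations in batch RL.
# Toy two-state POMDP: reward 1 for matching the hidden state, action 1
# resamples the hidden state, observations are correct with prob. 0.8.
import random
from collections import defaultdict

def rollout(policy, h, n_steps=200, seed=0):
    """Collect one batch of (history, action, reward, next_history) tuples."""
    rng = random.Random(seed)
    s = 0                                        # hidden state in {0, 1}
    obs = s if rng.random() < 0.8 else 1 - s     # noisy observation
    hist = (obs,) * h
    batch = []
    for _ in range(n_steps):
        a = policy(hist, rng)
        r = 1.0 if a == s else 0.0               # reward depends on hidden state
        s = rng.randrange(2) if a == 1 else s    # action 1 resamples the state
        obs = s if rng.random() < 0.8 else 1 - s
        nhist = (hist + (obs,))[-h:] if h > 0 else ()
        batch.append((hist, a, r, nhist))
        hist = nhist
    return batch

def fitted_q(batch, gamma=0.9, sweeps=50):
    """Tabular Q-learning swept repeatedly over a fixed batch
    (a stand-in for fitted Q-iteration on history 'states')."""
    q = defaultdict(float)
    for _ in range(sweeps):
        for hist, a, r, nhist in batch:
            target = r + gamma * max(q[(nhist, b)] for b in (0, 1))
            q[(hist, a)] += 0.1 * (target - q[(hist, a)])
    return q

random_policy = lambda hist, rng: rng.randrange(2)
for h in (0, 1, 3, 6):                           # history length = representation size
    batch = rollout(random_policy, h)
    q = fitted_q(batch)
    n_hist = len({hist for hist, _, _, _ in batch})
    print(f"h={h}: {n_hist} distinct histories, "
          f"{len(q)} Q-entries fitted from 200 transitions")
```

As h grows, the same batch fragments across exponentially more history states, leaving fewer samples per state; lowering gamma in fitted_q shortens the effective planning horizon, which is the discount-factor lever the abstract mentions for softening this tradeoff.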
Similar resources
LSTD on Sparse Spaces
Efficient model selection and value function approximation are tricky tasks in reinforcement learning (RL), when dealing with large feature spaces. Even in batch settings, when the number of observed trajectories is small and the feature set is high-dimensional, there is little hope that we can learn a good value function directly based on all the features. To get better convergence and handle ...
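Since the snippet above is cut off, here is only a generic illustration of the estimator it discusses: least-squares temporal difference (LSTD) with ridge regularization, in exactly the regime it describes (few observed transitions, many sparse features). The feature construction, the regularizer lam, and the synthetic batch are assumptions for illustration, not the cited paper's method.

```python
# Hedged sketch: ridge-regularized LSTD for high-dimensional sparse features.
import numpy as np

def lstd(phi, phi_next, rewards, gamma=0.95, lam=0.1):
    """Solve (A + lam*I) theta = b for linear value-function weights.

    phi, phi_next: (n_samples, d) feature matrices for s_t and s_{t+1}.
    rewards:       (n_samples,) observed rewards.
    """
    d = phi.shape[1]
    A = phi.T @ (phi - gamma * phi_next)   # sum of phi(s) (phi(s) - gamma*phi(s'))^T
    b = phi.T @ rewards                    # sum of phi(s) * r
    return np.linalg.solve(A + lam * np.eye(d), b)

# Tiny synthetic batch: 50 transitions but 200 sparse binary features,
# so regularization is what makes the solve well-posed.
rng = np.random.default_rng(0)
n, d = 50, 200
phi = (rng.random((n, d)) < 0.05).astype(float)
phi_next = (rng.random((n, d)) < 0.05).astype(float)
rewards = rng.normal(size=n)
theta = lstd(phi, phi_next, rewards)
print("value estimate of first sample:", phi[0] @ theta)
```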
Manifold Embeddings for Model-Based Reinforcement Learning under Partial Observability
Interesting real-world datasets often exhibit nonlinear, noisy, continuous-valued states that are unexplorable, are poorly described by first principles, and are only partially observable. If partial observability can be overcome, these constraints suggest the use of model-based reinforcement learning. We experiment with manifold embeddings to reconstruct the observable state-space in the conte...
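The reconstruction idea this snippet refers to can be illustrated with a Takens-style delay embedding, one standard way to build a manifold embedding from a partially observed signal. The embedding dimension m, the lag tau, and the toy signal below are illustrative assumptions; the cited paper's construction may differ.

```python
# Hedged sketch: delay embedding of a scalar observation series, so the
# embedded vectors trace out a manifold that can stand in for the hidden state.
import numpy as np

def delay_embed(obs, m=3, tau=2):
    """Map a 1-D observation series to m-dimensional delay vectors."""
    n = len(obs) - (m - 1) * tau
    return np.stack([obs[i: i + n] for i in range(0, m * tau, tau)], axis=1)

# Partially observed dynamics: we see only one noisy coordinate of a rotation.
t = np.linspace(0, 20 * np.pi, 2000)
obs = np.sin(t) + 0.01 * np.random.default_rng(0).normal(size=t.size)
embedded = delay_embed(obs)      # shape (1996, 3): a reconstructed "state"
print(embedded.shape)
```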
Deep Decentralized Multi-task Multi-Agent Reinforcement Learning under Partial Observability
Many real-world tasks involve multiple agents with partial observability and limited communication. Learning is challenging in these settings due to local viewpoints of agents, which perceive the world as non-stationary due to concurrently exploring teammates. Approaches that learn specialized policies for individual tasks face problems when applied to the real world: not only do agents have to ...
Learning with Partial Observations in General-sum Stochastic Games
In many situations, multiagent systems must deal with the partial observability that agents have of the environment. In these cases, finding optimal solutions is often intractable for more than two agents, and approximate solutions are often the only way to solve these problems. The model known to represent this kind of problem is the Partially Observable Stochastic Game (POSG). Such a model is usuall...
Reinforcement Learning in Partially Observable Multiagent Settings: Monte Carlo Exploring Policies with PAC Bounds
Perkins’ Monte Carlo exploring starts for partially observable Markov decision processes (MCES-P) integrates Monte Carlo exploring starts into a local search of policy space to offer a template for reinforcement learning that operates under partial observability of the state. In this paper, we generalize reinforcement learning under partial observability to the self-interested multiagent se...
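As a rough illustration of the single-agent template being generalized here, below is a hedged sketch of Monte Carlo local search over reactive (observation-to-action) policies: each neighbor that changes the action at one observation is scored by rollouts, and the search moves when a neighbor clearly wins. The toy POMDP, rollout counts, and acceptance margin are assumptions, and the sketch omits the exploring-starts machinery and PAC bounds of MCES-P proper.

```python
# Hedged sketch: Monte Carlo hill-climbing in the space of reactive policies.
import random

OBS, ACTS = (0, 1), (0, 1)

def rollout_return(policy, rng, steps=30, gamma=0.95):
    """Discounted return of one episode of a toy two-state POMDP."""
    s = rng.randrange(2)
    g, disc = 0.0, 1.0
    for _ in range(steps):
        o = s if rng.random() < 0.8 else 1 - s   # noisy observation
        a = policy[o]
        g += disc * (1.0 if a == s else 0.0)     # reward for matching the state
        disc *= gamma
        s = rng.randrange(2) if a == 1 else s    # action 1 resamples the state
    return g

def mc_value(policy, rng, n=300):
    """Monte Carlo estimate of the policy's expected return."""
    return sum(rollout_return(policy, rng) for _ in range(n)) / n

rng = random.Random(0)
policy = {o: rng.randrange(2) for o in OBS}      # random initial policy
improved = True
while improved:                                  # local search in policy space
    improved = False
    base = mc_value(policy, rng)
    for o in OBS:
        for a in ACTS:
            # Accept a one-action neighbor only if it beats the current
            # estimate by a margin large enough to dominate MC noise.
            if a != policy[o] and mc_value({**policy, o: a}, rng) > base + 0.5:
                policy = {**policy, o: a}
                improved = True
                break
        if improved:
            break
print("locally optimal reactive policy:", policy)
```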
Journal: CoRR
Volume: abs/1709.07796
Publication date: 2017